Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simulate blocking behavior using non-blocking on networking #13318

Merged
merged 7 commits into from
Sep 4, 2020

Conversation

anr2me
Copy link
Collaborator

@anr2me anr2me commented Aug 21, 2020

An attempt to improve multiplayer performance/reduce lags/stutters due to blocking behavior.
This should fixes games that get heavy performance degradation (ie. Power Stone Collection/2), and games that almost permanently frozen (ie. Hotshots Tennis, Digimon World Re:Digitize, etc).
Also bumped into possible crash related to AdhocMatching while testing this, thus fixing it along with this.

PS: This is my first time using ScheduleEvent so there are many questions and assumption :)

Here are the test builds:
Win32 & Win64: https://www.dropbox.com/s/kbwwx0q0w4ecm2d/PPSSPP_1.10.3-perftest_Win32x64.zip?dl=0
Android ARMv7 32bit: https://www.dropbox.com/s/3gl942pt5n91fer/PPSSPP_1.10.3-perftest_ARMv7.apk?dl=0

@anr2me
Copy link
Collaborator Author

anr2me commented Aug 21, 2020

Apparently this will cause connection issue with more than 2 ms latency when i tried to simulate high latency using clumsy https://github.com/jagt/clumsy/ :( and 2 ms is not an acceptable latency limit even on LAN, so i'll probably need to retesting these.

Increasing the MinTimeout to 100ms doesn't seems to have any effect to this issue, while it should.. if latency was the real issue

@anr2me
Copy link
Collaborator Author

anr2me commented Aug 21, 2020

Alright, this should be a lot more stable, even without increasing the MinTimeout :)

@hrydgard
Copy link
Owner

Nice stuff. Will look at this one tomorrow!

@anr2me
Copy link
Collaborator Author

anr2me commented Aug 21, 2020

I'll probably need to create several sub functions as the event handler is getting too long :(

@boy13epic
Copy link

i think this might break games

@anr2me
Copy link
Collaborator Author

anr2me commented Aug 22, 2020

Now i'm facing with another issue :( changing MinTimeout to 200ms on Power Stone Collection will cause the FPS to drop from 60 FPS to a slideshow of 5 FPS (even though it was running at 99.8%) when using non-blocking socket while it doesn't have any impact on blocking socket (as long the actual latency is lower than 200ms).
When i checked the logs it seems recv always reaching that 200ms timeout on a freaking localhost! that's just weird :(

Actually i'm always wondering why it took more than 10 seconds to "connect" with non-blocking socket on a freaking LAN since months ago when testing non-blocking on Dissidia 012, while it only took a few milliseconds with blocking socket.
Looks like i'll need to investigate the socket...

Edit: on Power Stone Collection it seems both sides only recv without sending anything that's why it's always getting timeout, but it seems to only happened when simulating blocking behavior using non-blocking socket

@anr2me
Copy link
Collaborator Author

anr2me commented Aug 22, 2020

After comparing the logs, looks like the buggy one was the MinTimeout implementation (the one that tries to simulate blocking within sceNetAdhocPtpRecv instead of using Scheduled Events, didn't make use the MinTimeout value)

Power Stone Collection/2 seems to send on 3 seconds interval while polling with (blocking?!) recv on 10-20 msec interval with 100 usec timeout.
Overriding the timeout to 200ms is expected to cause the recv to have at least 200ms interval and getting timeouts several times within 3 seconds, so this blocking simulation using Scheduled Events is already correct.

However, that game is doing this on a separated PSP thread(networkThread) than the main thread that do the rendering, so i'm confused why it's affecting/dropping the FPS if it was running on a different PSP thread and showing 99.8% performance ?

PS: I even tried to increase the delay on ScheduleEvent to 50ms and it still causing the FPS to drops from 60 FPS to 5 FPS :(

CoreTiming::ScheduleEvent(usToCycles(50000) - cyclesLate, adhocSocketNotifyEvent, userdata);

So how can i improve the FPS?
Is this happened because the events is running on idle0 thread? is this supposed to happen when using Scheduled Events?

PPS: Apparently increasing the event delay to 50ms will also cause FPS drops even with MinTimeout=1 due to the sceNetAdhocPtpRecv is blocking the networkThread for too long.
Does the main thread also gets blocked when an event is scheduled where CurrentThread is the networkThread?

@ghost
Copy link

ghost commented Aug 22, 2020

After comparing the logs, looks like the buggy one was the MinTimeout implementation (the one that tries to simulate blocking within sceNetAdhocPtpRecv instead of using Scheduled Events, didn't make use the MinTimeout value)

Power Stone Collection/2 seems to send on 3 seconds interval while polling with (blocking?!) recv on 10-20 msec interval with 100 usec timeout.
Overriding the timeout to 200ms is expected to cause the recv to have at least 200ms interval and getting timeouts several times within 3 seconds, so this blocking simulation using Scheduled Events is already correct.

However, that game is doing this on a separated PSP thread(networkThread) than the main thread that do the rendering, so i'm confused why it's affecting/dropping the FPS if it was running on a different PSP thread and showing 99.8% performance ?

PS: I even tried to increase the delay on ScheduleEvent to 50ms and it still causing the FPS to drops from 60 FPS to 5 FPS :(

CoreTiming::ScheduleEvent(usToCycles(50000) - cyclesLate, adhocSocketNotifyEvent, userdata);

So how can i improve the FPS?

TBH its not that bad with that game because only the menus are slow and ingame the game is full speed.
Better check if other games react good too.

@anr2me
Copy link
Collaborator Author

anr2me commented Aug 22, 2020

After comparing the logs, looks like the buggy one was the MinTimeout implementation (the one that tries to simulate blocking within sceNetAdhocPtpRecv instead of using Scheduled Events, didn't make use the MinTimeout value)
Power Stone Collection/2 seems to send on 3 seconds interval while polling with (blocking?!) recv on 10-20 msec interval with 100 usec timeout.
Overriding the timeout to 200ms is expected to cause the recv to have at least 200ms interval and getting timeouts several times within 3 seconds, so this blocking simulation using Scheduled Events is already correct.
However, that game is doing this on a separated PSP thread(networkThread) than the main thread that do the rendering, so i'm confused why it's affecting/dropping the FPS if it was running on a different PSP thread and showing 99.8% performance ?
PS: I even tried to increase the delay on ScheduleEvent to 50ms and it still causing the FPS to drops from 60 FPS to 5 FPS :(

CoreTiming::ScheduleEvent(usToCycles(50000) - cyclesLate, adhocSocketNotifyEvent, userdata);

So how can i improve the FPS?

TBH its not that bad with that game because only the menus are slow and ingame the game is full speed.
Better check if other games react good too.

Power Stone actually runs smoothly with this blocking simulation using Scheduled Events... as long the MinTimeout are not changed to a higher value that is.

I don't know any other games that have significant performance drop like what Power Stone had, most of the games i tested runs normal.
Do you know any other games that runs as slow as Power Stone?

PS: This is the scene which usually have very low FPS on Power Stone, runs smoothly with this PR (as long i use MinTimeout=1)
image

@ghost
Copy link

ghost commented Aug 22, 2020

Instead of games that go too slow I know that there are also games that work too fast like need for speed and burnout.
And Sega Genesis Collection works too slow if you have WLAN on as well but I dont know if its related.

@anr2me
Copy link
Collaborator Author

anr2me commented Aug 22, 2020

Instead of games that go too slow I know that there are also games that work too fast like need for speed and burnout.
And Sega Genesis Collection works too slow if you have WLAN on as well but I dont know if its related.

LOL as far as i know networking can only slow things down and not making it faster >.< so if it runs faster it probably have something to do with the PPSSPP engine it self.

Anyway, the only explanation i can think of from Power Stone issue is that the game might just have a bad networking code.
For example, assuming the game is doing a lock before calling a blocking sceNetAdhocPtpRecv and unlocks it after that call is done on networkThread, while the main thread is also waiting for that lock and ended waiting for the whole blocking process, which defeats the purpose of running blocking network code on a separate thread.

@anr2me
Copy link
Collaborator Author

anr2me commented Aug 22, 2020

TBH its not that bad with that game because only the menus are slow and ingame the game is full speed.
Better check if other games react good too.

Here are the test builds if you want to tests this over the internet.
Win32 & Win64: https://www.dropbox.com/s/kbwwx0q0w4ecm2d/PPSSPP_1.10.3-perftest_Win32x64.zip?dl=0
Android ARMv7 32bit: https://www.dropbox.com/s/3gl942pt5n91fer/PPSSPP_1.10.3-perftest_ARMv7.apk?dl=0

@anr2me anr2me force-pushed the net_blocking branch 2 times, most recently from 928b4da to e21f8bb Compare August 23, 2020 17:21
@ghost
Copy link

ghost commented Aug 23, 2020

Warriors Orochi 2 : in-game when playing on adhoc would be too slow with multiple instances of PPSSPP.
Didn't try over the internet.
Timeout option wont affect FPS.
This PR fixes the game though.

Copy link
Collaborator

@unknownbrackets unknownbrackets left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For any FPS drop, is the game FPS down or is PPSSPP having trouble?

The way to tell is VPS, if we're still flipping as often, then yeah - we're delaying something too much, or maybe even not enough (allowing the game to get stuck in a loop that doesn't reschedule and cause another PSP thread to run.)

-[Unknown]

}

if (p.mode == p.MODE_READ) {
// Reallocate if it wasn't the correct memory block
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You definitely shouldn't need to do this, unless scanInfosAddr is 0. The memory blocks are all loaded as well, although you shouldn't depend on the sequence of loading.

For example, PSPNetconfDialog::DoState may actually be called before userMemory.DoState (I think this is not the case, but I don't think the code should depend on the order.) You really should not interact with userMemory within this function at all.

Once all DoState functions finish, you can depend on userMemory looking exactly like it did when the user saved state. So the address will still be valid if it was valid before.

When a save state is "upgraded" then you need a fallback, but that's it. And in that case, scanStep will also be 0 so we should be safe.

-[Unknown]

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw, does kernelMemory blocks also saved just like userMemory? i remembered long time ago i use kernelMemory for the loop code instead of userMemory, may be i should change that to userMemory too

@@ -200,6 +203,15 @@ SceNetAdhocctlPeerInfo * findFriend(SceNetEtherAddr * MAC) {
return peer;
}

int getNonBlockingFlag(int fd) {
#ifdef _WIN32
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, you could use ioctlsocket. What is this used for?

-[Unknown]

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are currently switching the socket blocking & non-blocking.

From what i read at stackoverflow, on Windows there is no way to get the blocking state of a socket, so this getNonBlockingFlag is kinda broken on windows, and i guess it won't be needed anymore since this PR is simulating the blocking process so we can assume it will always be in non-blocking mode.

@@ -2075,7 +2075,7 @@ int server_loop(int server)
sleep_ms(1);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should really be a select() and there should be no sleeps.

In case it's helpful, see this code that does a bunch of non-blocking socket operations:
https://github.com/hrydgard/ppsspp/blob/master/ext/native/net/websocket_server.cpp

-[Unknown]

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it was the old code from aemu project, i didn't touch the logic much

actually, select with timeout will also sleeping under the hood, since it's blocking the thread :)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, but performance should be better with select() since it'll act on new network data much quicker.

-[Unknown]


SceUID waitID = __KernelGetWaitID(threadID, WAITTYPE_NET, error);
if (waitID == 0 || error != 0)
return; // FIXME: Is it safe to exit here like this without re-scheduling the event? Will this event be triggered again? What will happen to the result i might want to change if exited here?
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, it won't automatically. But if the thread is not waiting no that, something has gone wrong and you probably don't want it to.

Most common cause would be the game calling sceKernelReleaseWaitThread().

-[Unknown]

Copy link
Collaborator Author

@anr2me anr2me Aug 23, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

During my test it was woken-up by hleDelayResult on the AdhocMatching loop hle, so i'm replacing it with sceKernelDelayThread and didn't get woken-up randomly anymore.

What i want to know it, in the case the waiting sceNet function got woken up by something else, what kind of return value will it get? was it the normal return value from that function will be used?

For example:

int sceNetSomething(){
...
	CoreTiming::ScheduleEvent(usToCycles(100), adhocSocketNotifyEvent, param);
	__KernelWaitCurThread(WAITTYPE_NET, socketId, 0, 0, false, reason);

	// Fallback return value?
	return ERROR_NET_ADHOC_TIMEOUT;
}

i assume the ERROR_NET_ADHOC_TIMEOUT will be used as return value in the case the thread got woken up unexpectedly, is that a correct assumption?

btw, waitID == 0 also mean that the thread got woken up unexpectedly right? so we can safely exit without causing the sceNet keeps waiting and never exit the function

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "normal" return value is discarded. Nothing will use that.

SCE_KERNEL_ERROR_RELEASE_WAIT is the return value if sceKernelReleaseWaitThread is used. If you use hleDelayResult on a thread that was already waiting on something else, its new result will be whatever you specified in the hleDelayResult() call.

__KernelGetWaitID() actually returns -1 on error, so technically 0 just means you specified 0 as the wait id in __KernelWaitCurThread() or etc.

PSP threads won't be woken unexpectedly. There's no such thing as spurious wakeup in our HLE thread management.

-[Unknown]

if (waitID == 0 || error != 0)
return; // FIXME: Is it safe to exit here like this without re-scheduling the event? Will this event be triggered again? What will happen to the result i might want to change if exited here?

int waitVal = __KernelGetWaitValue(threadID, error); // FIXME: Is this value going to be a valid value if waitID == 0? or it's a value belonged to other event?
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you got an invalid wait ID, it could be this thread could even be waiting on something totally different now. So no, the waitVal is probably not interesting in that case.

-[Unknown]

result = 0; // Faking successfully connected to adhoc server
}

//HLEKernel::ResumeFromWait(threadID, WAITTYPE_NET, uid, result); // FIXME: This won't do anything if waitID == 0, not sure what kind of value returned from the HLE which i might want to change here.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, it seems like it'd be better to put the desired adhocctlState in waitID, and keep the waitVal for the return value. Might be tricky for save states.

You could also keep an unordered_map of thread id -> return value, and store the value there when scheduling the wait.

-[Unknown]

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well i'm trying to use something unique as waitID, since windows socket is kinda unique and there is only one socket that communicate with adhoc server so i'm using it, it's also easier to use compared to create yet another global vector/map :D

if (ret >= 0) {
DEBUG_LOG(SCENET, "sceNetAdhocPdpSend[%i:%u](B): Sent %u bytes to %s:%u\n", uid, getLocalPort(pdpsocket->id), ret, inet_ntoa(target.sin_addr), ntohs(target.sin_port));
// Remove successfully sent to peer to prevent sending the same data again during a retry
peer = targetPeers.peers.erase(peer);
Copy link
Collaborator Author

@anr2me anr2me Aug 23, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit worried with this erase, i wanted to erase the content of the global map, but not sure whether this code will affect globally or this peers is only a copy?

i only want to retry sending on failed peers and not resending to the one already successful on the next rescheduled event

do i need to reference the peers first using auto& peers = targetPeers.peers or can i use it directly from the referenced parent (targetPeers) like this?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

targetPeers is a reference, so any changes to targetPeers.peers will affect that reference.

foo.erase() mutates foo, you'd need to do std::deque<AdhocSendTarget>(foo).erase or something to make it use a copy, and even then peer would not be a valid iterator to the copy, I assume.

-[Unknown]

@anr2me
Copy link
Collaborator Author

anr2me commented Aug 24, 2020

Warriors Orochi 2 : in-game when playing on adhoc would be too slow with multiple instances of PPSSPP.
Didn't try over the internet.
Timeout option wont affect FPS.
This PR fixes the game though.

Btw is there any regression on games that was working fine before?

PS: If you are using an older version of test build, you might want to redownload the test build since i updated it yesterday to implement blocking simulation on PdpSend

@ghost
Copy link

ghost commented Aug 24, 2020

Warriors Orochi 2 : in-game when playing on adhoc would be too slow with multiple instances of PPSSPP.
Didn't try over the internet.
Timeout option wont affect FPS.
This PR fixes the game though.

Btw is there any regression on games that was working fine before?

PS: If you are using an older version of test build, you might want to redownload the test build since i updated it yesterday to implement blocking simulation on PdpSend

Yes DBZ Tag Team would sometimes disconnect in the middle or the beginning of the battle.
I dont remember disconnections before this PR.
Ratchet and Clank Size Matters didnt exactly worked in-game but now it regressed even further because I cant go further in the menus when I try to join.
Tried both with multiple instances and timeout = 1.

@cahvireyu
Copy link

I can't see other players on Power Stone Collection using the test build but I can using automated builds. I'm using Zerotier on 2 Android devices. Anyway thank for your work.

@anr2me
Copy link
Collaborator Author

anr2me commented Aug 26, 2020

Warriors Orochi 2 : in-game when playing on adhoc would be too slow with multiple instances of PPSSPP.
Didn't try over the internet.
Timeout option wont affect FPS.
This PR fixes the game though.

Btw is there any regression on games that was working fine before?
PS: If you are using an older version of test build, you might want to redownload the test build since i updated it yesterday to implement blocking simulation on PdpSend

Yes DBZ Tag Team would sometimes disconnect in the middle or the beginning of the battle.
I dont remember disconnections before this PR.
Ratchet and Clank Size Matters didnt exactly worked in-game but now it regressed even further because I cant go further in the menus when I try to join.
Tried both with multiple instances and timeout = 1.

I just realized there is a bug with communication to the adhoc server while watching the logs on KHBBS, sometimes the Ping is getting socket error 10053 and got disconnected from adhoc server, so may be the disconnection issue was related to that bug.
If it lost connection to the adhoc server sometimes you won't be able to see each other, since other players are thinking you're not playing that game anymore.

Well i've fixed it on my side, will push it later after i'm done removing memory allocation from savesate, probably on weekend since i need to prepare for a job interview first.

@ghost
Copy link

ghost commented Aug 31, 2020

There is no graphical issues it's only the multiplayer that got that issue lol.
Single player works fine.
It might not be adhoc related dunno.
So far everything works OK from what I seen with other games.

@zakilj3
Copy link

zakilj3 commented Aug 31, 2020

I'd say that its probably the multiplayer mode itself that is bugged and not the connection (at least not with multiple instance on pc)

@anr2me
Copy link
Collaborator Author

anr2me commented Aug 31, 2020

In my case it even crashed after choosing "Multiplayer Mode" which doesn't do anything network related yet (no network initialized yet), it crashed in the middle of the rocket flying animation LOL

Anyway, did you guys still seeing "Network Initialized" message on screen in the middle of multiplayer on any game?
If you do, let me know how to reproduce it.

@anr2me
Copy link
Collaborator Author

anr2me commented Aug 31, 2020

Updated to fix an issue with Naruto Shippuden Ultimate Ninja Heroes 3 that got broken on previous changes.

Also added a 5 seconds timeout on NetconfDialog to prevent it from retrying to connect/create/join forever when loosing connectivity to the adhoc server.
It will shows something like this when getting timedout with error code taken from the failing sceNet:
image

PS: Test builds have been updated

@anr2me
Copy link
Collaborator Author

anr2me commented Sep 4, 2020

After rebasing this, it seems Kingdom Hearts BBS have a working Versus and Arena mode now O.o
image
image

PS: Windows Test builds have been updated.

…ceUtilityNetconfGetStatus()" after "0=sceUtilityNetconfShutdownStart()" instead of directly to "0=sceUtilityNetconfGetStatus()" (just like what happen when using utility.prx file with KHBBS on JPCSP)
…ver sometimes getting socket error 10053 and disconnected from Adhoc Server
…o the one from prx file (based on Kingdom Hearts BBS)
@hrydgard
Copy link
Owner

hrydgard commented Sep 4, 2020

Ooh, very cool!

Is this branch ready to go now?

@zakilj3
Copy link

zakilj3 commented Sep 4, 2020

Interesting, I should try it with 2 windows and also with android(but I feel there will likely be issues with Android just like with bleach and some other games)

@anr2me
Copy link
Collaborator Author

anr2me commented Sep 4, 2020

Ooh, very cool!

Is this branch ready to go now?

I think so, i haven't found anymore issue so far, at least no regression.

@anr2me
Copy link
Collaborator Author

anr2me commented Sep 4, 2020

Interesting, I should try it with 2 windows and also with android(but I feel there will likely be issues with Android just like with bleach and some other games)

Android build is not updated yet, i'm still building it.

Edit: Android Test build has been updated now.

@ghost
Copy link

ghost commented Sep 4, 2020

This got anything to do with the Gran Turismo fix?
Maybe its the key to fix the coop mode too which is the only one that got left lol.

@anr2me
Copy link
Collaborator Author

anr2me commented Sep 4, 2020

This got anything to do with the Gran Turismo fix?
Maybe its the key to fix the coop mode too which is the only one that got left lol.

High probability, because that PR fixed a lot of things on games that use AdhocMatching, while this PR is improvements with side effects of fixing games that was frozen due to blocking socket that blocks the CPU emulation.

Btw, which one is the coop mode? i only tested it on Versus and Arena mode

@ghost
Copy link

ghost commented Sep 4, 2020

This got anything to do with the Gran Turismo fix?
Maybe its the key to fix the coop mode too which is the only one that got left lol.

High probability, because that PR fixed a lot of things on games that use AdhocMatching, while this PR is improvements with side effects of fixing games that was frozen due to blocking socket that blocks the CPU emulation.

Btw, which one is the coop mode? i only tested it on Versus and Arena mode

I fogot that Arena is the coop mode lol.
Yeah it means all the important modes work now.
Just need to try it over the internet.

@anr2me
Copy link
Collaborator Author

anr2me commented Sep 4, 2020

Interesting, I should try it with 2 windows and also with android(but I feel there will likely be issues with Android just like with bleach and some other games)

This even works better than JPCSP with prx files LOL with JPCSP after you're done and return to mirage area, one of the player (the joining one) is always got kicked out from mirage arena or frozen, while with PPSSPP it can go back to mirage area successfully without any issue.

Regarding Bleach 7, i feels there is timing issue because the host need to accept joining player as soon as possible, delaying for a short time will automatically cancel the join request, and it only happened on P2P mode, while Host/Join mode worked without problem.
Well at least one of multiplayer (Host/Join mode) on Bleach 7 is working properly now :D

@zakilj3
Copy link

zakilj3 commented Sep 4, 2020

I ve tested Android with pc, both vs and arena mode works flawly I ll be infinitely grateful to U ANR2ME

@anr2me anr2me deleted the net_blocking branch September 4, 2020 07:39
@Lemoncak3
Copy link

Phantasy star portable 2 have worked wonderfully again now. I hope you will have a way to fix another multiplayer game too. Thanks a lot

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants